ABSTRACT

I compare various semi-analytic models for the bias of dark matter halos with halo 
clustering properties observed in recent numerical simulations. The best fitting model 
is one based on the collapse of ellipsoidal perturbations proposed by Sheth, Mo & 
Tormen (1999), which fits the halo correlation length to better than 8 per cent ac- 
curacy. Using this model, I confirm that the correlation length of clusters of a given 
separation depends primarily on the shape and amplitude of mass fluctuations in the 
universe, and is almost independent of other cosmological parameters. Current obser- 
vational uncertainties make it difficult to draw robust conclusions, but for illustrative 
purposes I discuss the constraints on the mass power spectrum which are implied by 
recent analyses of the APM cluster sample. I also discuss the prospects for improving 
these constraints using future surveys such as the Sloan Digital Sky Survey. Finally, I 
show how these constraints can be combined with observations of the cluster number 
abundance to place strong limits on the matter density of the universe. 
1 INTRODUCTION

Recent years have seen dramatic increases in the size of the 
largest cosmological N-body simulations, with giga-particle 
runs now able to track the non-linear gravitational evolu- 
tion of a significant fraction of the observable universe (Col- 
berg et al. 1998). Correspondingly, it is becoming possible 
to make more and more detailed predictions for the non- 
linear properties of dark matter in any cosmological model, 
with convergent results being obtained for such features as 
the profiles and correlation functions of dark matter halos 
(Moore et al. 1999). In spite of these advances, the one great 
challenge remaining for the simulators is to realistically track 
the evolution of cosmic gas through its complex cycles of 
star-formation and feedback. 

Given the current state of the art for cosmological sim- 
ulations, rich clusters of galaxies provide one of the most 
powerful tools with which to make detailed tests of cosmology.
First, cluster potential wells are so deep that their formation
and evolution are largely unaffected by the uncertain history of
the cosmic gas, making dissipationless N-body simulations an
ideal place in which to study their properties. Second, clusters
correspond to rare peaks in the primordial
density field, so their statistics are extremely sensitive to 
detailed features of the cosmology. 

In this paper, I will show that a simple model proposed by
Sheth, Mo & Tormen (1999) can give an extremely good fit to the
correlation function of massive halos observed in recent
state-of-the-art numerical simulations. Using this model I will
compare the predictions of a range of cosmological models with
observations of the cluster correlation length. In particular, I
will show that correlation length observations place strong
constraints on the power spectrum of mass fluctuations in the
universe, constraints that are almost independent of other
cosmological parameters. In section 2 I describe the Sheth, Mo &
Tormen model for the cluster correlation length, together with a
similar semi-analytic model due to Mo & White (1996), and other
bias models discussed in the literature. In section 3 I compare
the predictions of these models with recent numerical
simulations, and in section 4 I show the constraints which can be
derived when the models are compared with observations. In
section 5 I discuss the relationship of this work to previous
studies, and draw my conclusions.



2 MODELS FOR THE CORRELATION LENGTH

I describe four semi-analytic models which make predictions for
the amplitude of halo correlations: one due to Mo & White (1996 -
hereafter MW), one due to Sheth, Mo & Tormen (1999 - hereafter
SMT), one due to Sheth & Tormen (1999 - hereafter ST), and one
based on a model by Lee & Shandarin (1998 - hereafter LS) and
extended in ST. The input parameters for these models are the
power spectrum of matter fluctuations $P(k)$ (extrapolated using
linear gravity to today) and the background cosmology of the
universe (specified here in terms of $\Omega_m$ and
$\Omega_\Lambda$, the contributions of matter and a cosmological
constant to the critical density today). I will compare the
predictions of these models with observations of the halo
correlation length $r_0(d)$, defined such that $\xi(r_0) = 1$,
where $\xi(r)$ is the halo correlation function. Here $d$ is the
mean halo separation for the halo sample in question, and the
sample is chosen to contain only those halos with "richness"
greater than some value. In this context I use "richness" ($R$)
to denote any parameter which correlates with halo mass (typical
examples for clusters are galaxy number counts and X-ray
luminosity). I will assume that the parameter $R$ can be
monotonically transformed into an "inferred mass"
$\mathcal{M} = f(R)$ which is correlated with the true mass $M$
with known conditional probability $p(\mathcal{M}|M)$.

For each model, the procedure for computing the cor- 
relation length for a sample of halos with mean separation 
d is as follows. 

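• First, compute the inferred mass threshold $\mathcal{M}$ such
that the mean separation of halos with inferred mass greater than
$\mathcal{M}$ is equal to $d$. That is, find $\mathcal{M}$
satisfying

$$N^I_X(>\mathcal{M}) = d^{-3}, \qquad (1)$$

where $N^I_X(>\mathcal{M})$ is the number density of halos with
inferred (as denoted by the superscript $I$) mass greater than
$\mathcal{M}$ for model $X$, and $X$ denotes the semi-analytic
formalism being used (either MW, SMT, ST or LS). Now,

$$N^I_X(>\mathcal{M}) = \int_{\mathcal{M}}^{\infty} n^I_X(\mathcal{M}')\, d\mathcal{M}', \qquad (2)$$

where $n^I_X(\mathcal{M})\, d\mathcal{M}$ is the number density of
halos with inferred masses between $\mathcal{M}$ and
$\mathcal{M} + d\mathcal{M}$, and

$$n^I_X(\mathcal{M}) = \int dM\, p(\mathcal{M}|M)\, n_X(M), \qquad (3)$$

with

$$n_X(M) = -\nu f_X(\nu)\, \frac{\bar{\rho}}{M}\, \frac{d\sigma(M)/dM}{\sigma(M)} \qquad (4)$$

being the number density of halos with true mass between $M$ and
$M + dM$. Here $\nu = \delta_c/\sigma(M)$, $\sigma(M)$ is the rms
linear fluctuation in a sphere containing an average mass $M$,
$\bar{\rho}$ is the background density of the universe, and
$\delta_c$ is the critical overdensity for collapse as computed
in the spherical collapse model - see Kitayama & Suto (1996) for
fits to the weak cosmological dependence. The quantity
$\sigma(M)$ can be straightforwardly calculated at any redshift
knowing $P(k)$ and the background cosmology. The function $f_X$
is given by

$$f_{\rm MW}(\nu) = \sqrt{\frac{2}{\pi}}\, \exp\left(-\frac{\nu^2}{2}\right) \qquad (5)$$

for the MW model, by

$$f_{\rm SMT}(\nu) = A \left(1 + \frac{1}{(a\nu^2)^q}\right) \sqrt{\frac{2a}{\pi}}\, \exp\left(-\frac{a\nu^2}{2}\right) \qquad (6)$$

for the SMT and ST models, with $A \approx 0.3222$, $q = 0.3$, and
$a = 0.707$, and by equation A1 of ST for the LS model.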

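• Calculate the effective bias parameter $b_{>\mathcal{M}}$ for
halos with an inferred mass greater than $\mathcal{M}$,
satisfying

$$b_{>\mathcal{M}} = \frac{1}{N^I_X(>\mathcal{M})} \int_{\mathcal{M}}^{\infty} b^I_X(\mathcal{M}')\, n^I_X(\mathcal{M}')\, d\mathcal{M}', \qquad (7)$$

where

$$b^I_X(\mathcal{M}) = \frac{\int dM\, p(\mathcal{M}|M)\, f_z(b_X(M))\, b_X(M)\, n_X(M)}{\int dM\, p(\mathcal{M}|M)\, n_X(M)} \qquad (8)$$

and $b_X(M)$ is the bias parameter for halos with true mass $M$,
given by

$$b_{\rm MW} = 1 + \frac{1}{\delta_c}\left(\nu^2 - 1\right) \qquad (9)$$

for the MW model, by

$$b_{\rm SMT} = 1 + \frac{1}{\sqrt{a}\,\delta_c} \left[ \sqrt{a}\,(a\nu^2) + \sqrt{a}\, b\, (a\nu^2)^{1-c} - \frac{(a\nu^2)^c}{(a\nu^2)^c + b(1-c)(1-c/2)} \right] \qquad (10)$$

for the SMT† model with $a = 0.707$, $b = 0.5$ and $c = 0.6$, by

$$b_{\rm ST} = 1 + \frac{1}{\delta_c}\left(a\nu^2 - 1\right) + \frac{2q/\delta_c}{1 + (a\nu^2)^q} \qquad (12)$$

for the ST model with $a$ and $q$ given above, and by equation A3
of ST for the LS model. Lastly, $f_z$ is an optional factor which
may be included to account for the effect of redshift space
distortion if the prediction is to be compared with observations
in redshift space. Following Kaiser (1987), $f_z$ is given by

$$f_z(b) = 1 + 2\beta/3 + \beta^2/5, \qquad (13)$$

where $\beta = \Omega_m^{0.6}/b$.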

• Solve for the correlation length $r_0$ satisfying
$(b_{>\mathcal{M}})^2\, \xi_{\rm NL}(r_0) = 1$, where
$\xi_{\rm NL}(r)$ is the non-linear mass correlation function,
given by the Fourier transform of the non-linear power spectrum
$P_{\rm NL}(k_{\rm NL})$. The non-linear power spectrum is
computed using the method described by Peacock & Dodds (1996)
with the linear power spectrum $P(k)$ as input. For comparison, I
also compute $r_0$ using the linear correlation function, that
is, I solve for $r_0$ satisfying
$(b_{>\mathcal{M}})^2\, \xi_{\rm L}(r_0) = 1$, where
$\xi_{\rm L}(r)$ is the Fourier transform of the linear power
spectrum $P(k)$. As we will see in section 3, use of the
non-linear correlation function gives a better fit for all the
models. Unless otherwise specified, all correlation lengths have
been computed using the non-linear correlation function.
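
The ingredients of the three steps above can be written down compactly. The following is a minimal sketch rather than the code used for the results in this paper: it assumes a fixed $\delta_c = 1.686$ (the Einstein-de Sitter value), takes $\beta = \Omega_m^{0.6}/b$ for the Kaiser factor, and replaces the full non-linear $\xi_{\rm NL}(r)$ by a toy power law $\xi(r) = (r/r_m)^{-\gamma}$ so that the last step has a closed form. All function and constant names are hypothetical.

```python
import math

DELTA_C = 1.686                                # assumed: Einstein-de Sitter collapse threshold
A_NORM, Q_EXP, A_SCALE = 0.3222, 0.3, 0.707    # A, q, a as quoted in the text
B_PAR, C_PAR = 0.5, 0.6                        # b, c as quoted in the text


def f_mw(nu):
    """MW multiplicity function, normalised to unity over nu > 0."""
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * nu * nu)


def f_smt(nu):
    """SMT/ST multiplicity function, written in terms of a * nu**2."""
    x = A_SCALE * nu * nu
    return (A_NORM * (1.0 + x ** -Q_EXP)
            * math.sqrt(2.0 * A_SCALE / math.pi) * math.exp(-0.5 * x))


def b_mw(nu):
    """MW bias for halos of peak height nu."""
    return 1.0 + (nu * nu - 1.0) / DELTA_C


def b_smt(nu):
    """SMT (ellipsoidal collapse) bias for halos of peak height nu."""
    x = A_SCALE * nu * nu
    root_a = math.sqrt(A_SCALE)
    bracket = (root_a * x + root_a * B_PAR * x ** (1.0 - C_PAR)
               - x ** C_PAR / (x ** C_PAR + B_PAR * (1.0 - C_PAR) * (1.0 - C_PAR / 2.0)))
    return 1.0 + bracket / (root_a * DELTA_C)


def b_st(nu):
    """ST bias for halos of peak height nu."""
    x = A_SCALE * nu * nu
    return 1.0 + (x - 1.0) / DELTA_C + (2.0 * Q_EXP / DELTA_C) / (1.0 + x ** Q_EXP)


def f_z(bias, omega_m):
    """Kaiser redshift-space factor, assuming beta = omega_m**0.6 / bias."""
    beta = omega_m ** 0.6 / bias
    return 1.0 + 2.0 * beta / 3.0 + beta * beta / 5.0


def r0_power_law(b_eff, r_mass, slope):
    """Solve b_eff**2 * xi(r0) = 1 for the toy form xi(r) = (r / r_mass)**(-slope)."""
    return r_mass * b_eff ** (2.0 / slope)


if __name__ == "__main__":
    for nu in (2.0, 3.0, 4.0):
        print(nu, round(b_mw(nu), 2), round(b_smt(nu), 2), round(b_st(nu), 2))
```

With these definitions the MW bias lies above the SMT bias for rare halos (for example at $\nu = 3$), which is the sense of the discrepancy discussed in the next section.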



3 COMPARISON WITH SIMULATIONS 

The models just described make predictions for the halo
two-point correlation function on any length scale $r$. Since the
shape of the halo correlation function is not well constrained by
current observations, most authors have focused just on the
amplitude, quantified in terms of the correlation length $r_0$. I
now compare the predictions of each of the models for $r_0$ with
the values observed in large collisionless N-body simulations (a
comparison between models and simulations for the full halo
correlation function is beyond the scope of the current paper,
but deserves future investigation). The simulations have been
carried out by Colberg et al. (1998 - hereafter C98) and
Governato et al. (1999 - hereafter G99). For each simulation the
power spectrum has been taken to have a CDM form (Bardeen et al.
1986, eq. G3) parameterized by a shape parameter $\Gamma$ (where
$\Gamma \approx \Omega_m h$, and $h$ is the Hubble constant in
units of 100 km s$^{-1}$ Mpc$^{-1}$) and normalization
$\sigma_8$, where $\sigma_8$ is the rms fluctuation in an
$8\,h^{-1}$ Mpc sphere. The power spectrum parameters for each
simulation are summarized in Table 1, together with the box
length $L$, particle number $N_p$ and background cosmology.
Fig. 1 compares the $r_0$-$d$ relation observed in the
simulations (data points) with that predicted by the MW (dotted
lines) and SMT (solid lines) formalisms. In computing these
predictions, I use $p(\mathcal{M}|M) = \delta(\mathcal{M} - M)$,
where $\delta$ denotes the Dirac delta function, since in this
case the richness property by which the halos are ranked is just
the true mass.

† The functions $f_{\rm SMT}(\nu)$ and $b_{\rm SMT}(\nu)$ given
here are just $f(\nu')$ and $b_{\rm Eul}(\nu')$ from equations 5
and 8 of SMT respectively, with $\nu' = \sqrt{a}\,\nu$. As
discussed in SMT, this rescaling ensures that the halo mass
function agrees with that observed in numerical simulations.

Figure 1. Correlation length versus halo separation as observed
in numerical simulations (datapoints) together with the
predictions of the SMT (solid curves) and MW (dotted curves)
models. The SMT predictions agree very closely with the results
of the simulations, while the MW predictions are typically 25 per
cent too high.
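
For the delta-function case just described, finding the mass threshold for a given separation $d$ and averaging the bias above it reduce to a one-dimensional root find followed by a weighted integral. The sketch below illustrates this under loudly stated assumptions: a toy power-law $\sigma(M) \propto M^{-\alpha}$ stands in for the CDM spectrum, the MW forms are used for the mass function and bias, and the mean density is that of an $\Omega_m = 1$ universe. The function names and the values of `SIGMA8` and `ALPHA` are hypothetical choices for illustration only.

```python
import math

DELTA_C = 1.686             # assumed spherical-collapse threshold
RHO_BAR = 2.775e11          # mean density for Omega_m = 1, in h^2 Msun Mpc^-3
SIGMA8, ALPHA = 0.7, 0.25   # toy normalisation and slope: sigma ~ M**-ALPHA
M8 = 4.0 / 3.0 * math.pi * 8.0 ** 3 * RHO_BAR   # mean mass in an 8 Mpc/h sphere


def nu_of(mass):
    """Peak height delta_c / sigma(M) for the toy power-law sigma(M)."""
    return DELTA_C / (SIGMA8 * (mass / M8) ** -ALPHA)


def dn_dlnm(mass):
    """Halo abundance per unit ln M, using d ln(sigma) / d ln M = -ALPHA."""
    nu = nu_of(mass)
    f_nu = math.sqrt(2.0 / math.pi) * math.exp(-0.5 * nu * nu)
    return RHO_BAR / mass * ALPHA * nu * f_nu


def b_mw(mass):
    """MW bias as a function of true halo mass."""
    return 1.0 + (nu_of(mass) ** 2 - 1.0) / DELTA_C


def integral_above(m_min, weight=lambda m: 1.0, m_max=1e17, steps=2000):
    """Midpoint rule in ln M for the integral of weight(M) n(M) dM above m_min."""
    dlnm = math.log(m_max / m_min) / steps
    masses = (m_min * math.exp((i + 0.5) * dlnm) for i in range(steps))
    return sum(weight(m) * dn_dlnm(m) * dlnm for m in masses)


def threshold_mass(d, lo=1e12, hi=1e16):
    """Bisect in ln M for N(>M) = d**-3; N(>M) falls monotonically with M."""
    target = d ** -3
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        if integral_above(mid) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def effective_bias(m_thresh):
    """Number-weighted mean bias of all halos above the threshold mass."""
    return integral_above(m_thresh, weight=b_mw) / integral_above(m_thresh)


if __name__ == "__main__":
    m_thresh = threshold_mass(50.0)   # d in Mpc/h
    print(m_thresh, effective_bias(m_thresh))
```

Bisection is used because the cumulative abundance is monotonic in the threshold mass, so no derivative information is needed. A scatter between inferred and true mass would add one further convolution with $p(\mathcal{M}|M)$ before the cumulative integral.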



Model    $E$ ($\xi_{\rm NL}$)    $E$ ($\xi_{\rm L}$)
MW       0.22                    0.24
SMT      0.08                    0.10
ST       0.11                    0.12
LS       0.11                    0.14

Table 2. Goodness of fit for each of the bias models discussed in
section 2. The fit parameter $E$ is defined in section 3. The
first and second columns give the fit obtained using the
non-linear and linear correlation function respectively.



Clearly the SMT model gives much better agreement with the
cluster correlation length as observed in the simulations. I
quantify the goodness of fit by defining an rms error $E$ via

$$E^2 = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{r_0^i - r_0^X(d_i)}{r_0^X(d_i)} \right]^2, \qquad (14)$$

where the sum $i$ runs over all $N$ datapoints $(d_i, r_0^i)$
under consideration, and $r_0^X(d)$ denotes the correlation
length prediction for model $X$ at separation $d$. For the
combined datapoints from all four simulations, the SMT model
gives $E = 0.08$, while errors for the MW model are much larger,
with $E = 0.22$. The goodness of fit for each of the models is
given in Table 2. The left column shows results using the
non-linear correlation function to compute $r_0$, while the right
column shows results using the linear correlation function. Use
of the non-linear correlation function improves the fit in each
case.
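
The rms error is straightforward to evaluate once model predictions are in hand. The following minimal sketch assumes that the residual for each datapoint is normalised by the model prediction (a fractional error, which is an assumption about the exact normalisation), and the function name is hypothetical.

```python
import math


def rms_fractional_error(r0_obs, r0_pred):
    """Root-mean-square of (observed - predicted) / predicted over all datapoints."""
    if not r0_obs or len(r0_obs) != len(r0_pred):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(((obs - pred) / pred) ** 2 for obs, pred in zip(r0_obs, r0_pred))
    return math.sqrt(total / len(r0_obs))


if __name__ == "__main__":
    # Made-up numbers purely to exercise the function, not simulation data.
    print(rms_fractional_error([10.0, 20.0], [11.0, 18.0]))
```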

I conclude from this analysis that the SMT model gives 
a good fit (typical errors less than 8 per cent) to the halo 
correlation length arising from a full treatment of the non- 
linear gravitational evolution. The MW model does worst 
of all the models considered, with typical errors of order 
25 per cent. In particular, the MW model systematically 
overestimates the halo correlation length for fixed separation 
d, a result also found in C98. It is worth noting that the 
discrepancy between the MW model and the simulations is 
much larger than might be inferred from previous studies. 
For instance, Mo, Jing & White (1996 - hereafter MJW) and 
Jing (1999) suggest that the MW formula agrees at better 
than the five per cent level with the rare halo bias observed 
in their simulations. Closer examination of the rarest mass 
datapoints in Fig. 3 of Jing (1999) shows that the error is
actually much larger. Figs. 7 and 8 of Governato et al. also 
imply extremely good agreement between the MW formula 
and correlation lengths observed in their simulations. In fact, 
the MW prediction has been computed incorrectly in these 
figures, and the true agreement is much worse (as illustrated 
in Fig. 1 of this paper). A final source of confusion as to the
accuracy of the MW formula is that the curves showing the 
MW prediction in Fig. 8 of MJW have also been computed 
incorrectly, with the correct values for the correlation length 
being up to 25 per cent higher in some places. To summarize, 
there is clear evidence that the MW formula significantly 
overestimates the expected bias of the rarest halos, while 
the SMT model gives extremely good agreement. 

Model    Source   $\Omega_m$   $\Omega_\Lambda$   $\sigma_8$   $\Gamma$   $N_p$              $L/(h^{-1}\,{\rm Mpc})$
SCDM07   G99      1.0          0.0                0.7          0.5        $47 \times 10^6$   500
O3CDM    G99      0.3          0.0                1.0          0.225      $47 \times 10^6$   500
τCDM     C98      1.0          0.0                0.6          0.21       $10^9$             2000
ΛCDM     C98      0.3          0.7                0.9          0.21       $10^9$             3000

Table 1. Summary of parameters for each of the simulations
discussed in section 3.



4 COMPARISON WITH DATA 

I now make use of the SMT model to compare the predictions of a
range of cosmologies with observational data. Fig. 2 shows the
$r_0$-$d$ relation for the APM cluster survey, as analyzed by
Croft et al. (1997 - hereafter C97) and Lee & Park (1999 -
hereafter LP99). Although this is only a small fraction of
currently available data on the cluster correlation length, it
suffices to illustrate the wide degree of systematic uncertainty
which exists in present observations: even two analyses of the
same dataset give results that differ by considerable factors,
and the situation is even less clear when results from Abell (see
for instance Bahcall & Cen 1992) and X-ray (see LP99 for a
detailed discussion) cluster samples are included. Although
present day uncertainties are large, errors in $r_0$ will be
greatly reduced by future surveys. The Sloan Digital Sky Survey
(SDSS, see project book at
http://www.astro.princeton.edu/PBOOK/welcome.htm for detailed
specifications) is expected to identify roughly 1000 nearby
clusters in its spectroscopic galaxy redshift survey, and even
more in its photometric survey. The inclusion of redshift
information in the cluster identification procedure will greatly
reduce systematic uncertainties in $r_0$, and statistical
uncertainties will be reduced by approximately 50 per cent by the
increased sample size. In order to illustrate the type of results
we can hope for, I will carry out the bulk of my analysis using
just the C97 data. This data is at least self-consistent, and
$2\sigma$ confidence limits for the SDSS survey should be similar
in size to $1\sigma$ confidence limits from the C97 analysis.
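
As a rough check on that last statement, suppose the statistical error on $r_0$ scales inversely with the square root of the number of clusters (a Poisson-like assumption that is not stated explicitly above; $N_{\rm C97}$ and $N_{\rm SDSS}$ are labels introduced here for the two sample sizes). Then

$$\frac{\sigma_{\rm SDSS}}{\sigma_{\rm C97}} \simeq \left(\frac{N_{\rm C97}}{N_{\rm SDSS}}\right)^{1/2}, \qquad 2\,\sigma_{\rm SDSS} \simeq 2 \times 0.5\,\sigma_{\rm C97} = \sigma_{\rm C97},$$

so under this assumption a 50 per cent reduction in the statistical error corresponds to a sample roughly four times larger, and the $2\sigma$ band of the larger survey has about the width of the present $1\sigma$ band.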

I compute confidence limits for various models as follows. Each
model gives a specific prediction for the $r_0$-$d$ relation,
given which the likelihood‡ of observing a particular dataset is
(up to a constant of proportionality)

$$\mathcal{L} \propto \exp\left(-\chi^2/2\right), \qquad (15)$$

where

$$\chi^2 = \sum_i \frac{\left[r_0^i - r_0^X(d_i)\right]^2}{\sigma_i^2 + \left(E\, r_0^X(d_i)\right)^2}. \qquad (16)$$

Here the sum $i$ runs over all datapoints $(d_i, r_0^i)$ with
standard deviation $\sigma_i$, $E$ is the fractional uncertainty
in the predictions of the model (which following the discussion
in section 3 is taken to be $E = 0.08$) and the term
$(E\, r_0^X(d_i))^2$ is added in quadrature to the denominator to
account for systematic uncertainties in the model. Given a set of
model parameters and prior distributions for those parameters, I
calculate $1\sigma$ and $2\sigma$ confidence limits by computing
the likelihood threshold above which the integrated likelihood
accounts for 68 and 95 per cent respectively of the total
probability. Finally, all model predictions are computed at the
median redshift of each sample, which I take to be $z = 0.08$,
although the precise value used has little effect on the results.

‡ This analysis assumes that the datapoints are statistically
independent. While this should be a good approximation for the
datasets considered, the effect of correlations should ultimately
be taken into account in a more detailed treatment.

Figure 2. Cluster correlation length versus halo separation as
inferred by Croft et al. (1997 - solid squares) and Lee & Park
(1999 - open triangles) from an analysis of the APM cluster
sample. The solid line shows the SMT prediction for a model with
$\Omega_m = 0.3$, $\Gamma = 0.17$ and $\sigma_8 = 1.0$.
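
The likelihood calculation and the confidence-level threshold can be sketched in a few lines. This is an illustrative outline only, assuming statistically independent datapoints, flat priors, and a uniformly spaced parameter grid so that summing grid likelihoods approximates the integrated probability. The function names and the `predict_r0` callable (standing in for any model prediction of $r_0$ at separation $d$) are hypothetical.

```python
import math


def chi_squared(d_obs, r0_obs, sigma_obs, predict_r0, model_error=0.08):
    """Chi-squared with a fractional model error added in quadrature."""
    chi2 = 0.0
    for d, r0, sig in zip(d_obs, r0_obs, sigma_obs):
        pred = predict_r0(d)
        chi2 += (r0 - pred) ** 2 / (sig ** 2 + (model_error * pred) ** 2)
    return chi2


def likelihood(chi2):
    """Likelihood up to a constant of proportionality."""
    return math.exp(-0.5 * chi2)


def likelihood_threshold(grid_likelihoods, fraction):
    """Likelihood level above which grid cells hold `fraction` of the total."""
    if not grid_likelihoods:
        raise ValueError("empty likelihood grid")
    ordered = sorted(grid_likelihoods, reverse=True)
    target = fraction * sum(ordered)
    running = 0.0
    for value in ordered:
        running += value
        if running >= target:
            return value
    return ordered[-1]


if __name__ == "__main__":
    # Made-up grid of likelihood values, purely to exercise the functions.
    grid = [0.1, 0.4, 0.3, 0.2]
    print(likelihood_threshold(grid, 0.68), likelihood_threshold(grid, 0.95))
```

Cells whose likelihood exceeds the returned level make up the confidence region. Sorting the grid once keeps the procedure independent of the number of parameters being varied.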

Fig. 3 shows confidence limits in the $\Gamma$-$\sigma_8$ plane
derived from the observations shown in Fig. 2. Given values for
$\sigma_8$ and $\Gamma$, each model is fully specified once we
know $\Omega_m$ and the conditional probability distribution
relating the "inferred" mass for the cluster observations in
question to the true cluster mass. The shaded area labeled
"Croft" shows $1\sigma$ (dark region) and $2\sigma$ (light
region) confidence limits derived using the SMT model and the C97
data, in an $\Omega_m = 0.3$ background cosmology with zero
scatter between inferred and true cluster mass (use of an open
$\Omega_m = 0.3$ cosmology makes virtually no difference to the
results). Four other confidence regions are also shown in this
figure, each one the result of changing one aspect of the first
analysis. First, the shaded area labeled "MW" gives the $1\sigma$
confidence region which results from using the MW model rather
than the SMT model. The limits in this case are quite different,
and given the poor performance of the MW model when compared to
simulations, should be disregarded. Second, the shaded region
labeled "Lee" shows the $1\sigma$ limits resulting from using the
LP99 data (with its higher values for $r_0$) rather than the C97
data. The LP99 data favours dramatically lower values of $\Gamma$
for a given value of $\sigma_8$, and since the majority of
alternative datasets (see LP99 for a detailed discussion) also
prefer higher values of $r_0$, the high $\Gamma$ region to the
right of the "Croft" confidence limits is likely to be excluded
by all current observations. Third, the long-dashed lines
delineate the $1\sigma$ confidence region resulting from an
analysis identical to the "Croft" analysis except for the choice
of a flat $\Omega_m = 1$ cosmology. Increasing $\Omega_m$
slightly increases the amplification of clustering by redshift
space distortion, and slightly reduces the growth factor at the
median redshift of the sample. Both of these effects are small,
and the resulting confidence region is very similar to the
$1\sigma$ limits resulting from the $\Omega_m = 0.3$ analysis.
Lastly, the short-dashed lines delineate $1\sigma$ confidence
regions obtained when a significant scatter is introduced between
the inferred and true cluster masses. In particular,
$p(\mathcal{M}|M)$ is modeled as a log-normal distribution with a
natural logarithmic standard deviation $\sigma = 0.4$. This value
of $\sigma$ corresponds to roughly a 50 per cent scatter in the
inferred mass for a given true mass. Even for such a large
scatter, the $1\sigma$ confidence region is virtually unchanged,
demonstrating that robust constraints can be obtained even if
clusters of a given mass have a wide distribution of "richness"
values. The only requirement is that there is some monotonic
transformation which loosely correlates the richness (for
instance, X-ray luminosity or galaxy counts) with the true mass.

Figure 3. Confidence limits in the $\Gamma$-$\sigma_8$ plane
derived from cluster correlation length data. The shaded area
labeled "Croft" shows $1\sigma$ (dark band) and $2\sigma$ (light
band) limits derived from the fiducial analysis utilizing the
Croft data and assuming the SMT model for cluster correlations,
in an $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$ cosmology with
zero scatter between true and inferred cluster mass. Each of the
other four bands shows the results of changing one factor in the
fiducial analysis: using the Lee & Park (1999) analysis of the
APM clusters (shaded area labeled Lee), using the MW correlation
length formula (shaded area labeled MW), using a flat
$\Omega_m = 1.0$ cosmology (area delineated by long-dashed
lines), and finally using a 50 per cent scatter between true and
inferred cluster mass (area delineated by short-dashed lines).
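
To make the quoted scatter concrete, one way to write the log-normal model is shown below. The explicit form, including the choice that the median inferred mass equals the true mass, is an assumption made here for illustration and is not spelled out in the text:

$$p(\mathcal{M}|M) = \frac{1}{\sqrt{2\pi}\,\sigma\,\mathcal{M}} \exp\left[-\frac{(\ln\mathcal{M} - \ln M)^2}{2\sigma^2}\right], \qquad e^{\sigma} = e^{0.4} \approx 1.49.$$

A one-standard-deviation excursion in $\ln\mathcal{M}$ therefore multiplies the inferred mass by about 1.49, which is the sense in which $\sigma = 0.4$ corresponds to roughly a 50 per cent scatter.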

Fig. 3 demonstrates that observations of the cluster correlation
length place constraints on the amplitude and shape of the matter
power spectrum in the universe which are almost independent of
cosmology. As one final point, I illustrate the type of
cosmological constraints which can be obtained when these limits
are combined with independent observations of the mass power
spectrum. Fig. 4 shows confidence limits in the
$\Omega_m$-$\sigma_8$ plane from a combination of cluster
correlation length data and cluster number abundance data. The
dark and light hashed regions show $1\sigma$ and $2\sigma$
confidence bands derived from the local cluster temperature
function by Eke, Cole & Frenk (ECF - 1996). The other confidence
bands show $1\sigma$ and $2\sigma$ limits derived from the C97
data, under three different assumptions about the value of
$\Gamma$§. The solid lines show constraints for the case of a CDM
cosmology with
$\Gamma = \Omega_m h \exp\left(-\Omega_B (1 + \sqrt{2h}/\Omega_m)\right)$,
with $h = 0.65$ and $\Omega_B = 0.04$ (for conciseness, this
region is simply labeled by "$\Gamma = \Omega_m h$"). Increases
(decreases) in the value of $h$ just shift the confidence region
to lower (higher) values of $\Omega_m$ by the same factor. The
shaded bands show confidence limits for the choices
$\Gamma = 0.23$ (which provides a good fit to the power spectrum
of APM galaxies - see Viana & Liddle 1996) and $\Gamma = 0.1$.
For $\Gamma = 0.1$ the combined ECF and C97 constraints are
consistent with the case $\Omega_m = 1$, while the other choices
for $\Gamma$ each require $\Omega_m < 0.35$ for consistency
between the datasets at the $2\sigma$ level. If the LP99 data
were used instead of C97, even lower values of $\Omega_m$ would
be preferred.
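
The mapping between $\Omega_m$ and the shape parameter in the CDM case is simple enough to evaluate directly. The sketch below implements the expression quoted above with $h = 0.65$ and $\Omega_B = 0.04$ as defaults, and inverts it by bisection. The function names and the bracketing interval are hypothetical choices for illustration.

```python
import math


def shape_parameter(omega_m, h=0.65, omega_b=0.04):
    """Gamma = Omega_m h exp(-Omega_B (1 + sqrt(2h) / Omega_m))."""
    return omega_m * h * math.exp(-omega_b * (1.0 + math.sqrt(2.0 * h) / omega_m))


def omega_m_for_gamma(gamma, lo=0.05, hi=2.0, h=0.65, omega_b=0.04):
    """Invert shape_parameter by bisection; Gamma increases monotonically with Omega_m."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if shape_parameter(mid, h, omega_b) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for omega_m in (0.2, 0.3, 1.0):
        print(omega_m, shape_parameter(omega_m))
```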



§ Current uncertainties in $\Gamma$ should be greatly reduced by
future surveys such as SDSS, allowing us to place tight
constraints on $\Omega_m$ using the techniques described here.

Figure 4. Confidence limits in the $\Omega_m$-$\sigma_8$ plane
derived from cluster correlation length data and cluster
abundance data. The hashed area labeled "ECF" shows $1\sigma$
(dark band) and $2\sigma$ (light band) limits from the cluster
abundance analysis of Eke, Cole & Frenk (1996). Each of the other
three bands shows correlation length constraints from the C97
data, under three different assumptions about the value of
$\Gamma$, as labeled. For the $\Gamma = \Omega_m h$ case I take
$h = 0.65$.



5 CONCLUSION

I have shown that the halo correlation length measured in recent
numerical simulations can be well fit (to better than 8 per cent
accuracy) by a semi-analytic model based on the collapse of
ellipsoidal perturbations due to Sheth, Mo & Tormen (1999). A
similar analytic model due to Mo & White (1996) on the other hand
gives a poorer fit to the simulations (roughly 25 per cent
accuracy). Applying the SMT result to present data, I have shown
that correlation length observations place strong, almost
cosmology independent constraints on the shape and amplitude of
the matter power spectrum in the universe. By far the greatest
source of uncertainty is systematic discrepancies between current
datasets, but if the cluster correlation length is at least as
high as is implied by the APM survey, then an interesting region
of power spectrum parameter space (everything to the right of the
"Croft" region in Fig. 3) can be excluded. Future surveys,
including the Sloan Digital Sky Survey, should greatly reduce the
observational uncertainties, with $2\sigma$ confidence limits
from SDSS being as tight as the $1\sigma$ limits shown for the
C97 data in Fig. 3. One particularly interesting result is that
the analysis is almost unaffected when a significant scatter (up
to 50 per cent) is introduced between true cluster mass and the
richness property by which clusters are ranked. Such a scatter is
inevitable in any cluster survey, so it is extremely useful to
know that even an effect this large does not influence the
results. Finally, I have shown how the correlation length can be
combined with other cluster observations to place limits on the
matter density of the universe.

A number of previous studies have discussed constraints 
from the cluster correlation length. Bahcall & Cen (1992) 
and Croft & Efstathiou (1994) carried out numerical simu- 
lations showing that cluster correlation length was a strong 
discriminator among models, and obtained results which
are consistent with those discussed in this paper. Mo, Jing 
& White (1996) and later Robinson, Gawiser & Silk (1998) 
made use of the MW formalism to compare models and ob- 
servations for a wider range of parameters, and owing to the 
inaccuracy of the MW model, their results are not consistent 



with those found here (the inconsistency of the MW model 
is actually larger than is apparent in MJW due to an error 
in Fig. 8 of that paper). The work described here improves 
on previous studies by utilizing a formula which accurately 
fits the most recent numerical simulations, and which can 
be used to compute predictions for a large range of model 
parameters very quickly. 

Lastly, it should be noted that the analysis discussed 
here assumes that the primordial fluctuations in the universe 
are Gaussian. Non-gaussianity also influences the value of 
the cluster correlation length, and a number of authors, in- 
cluding Robinson, Gawiser & Silk (1998) and Koyama, Soda 
& Taruya (1999) have attempted to exploit this fact to use 
cluster observations to place constraints on primordial non- 
gaussianity. Both these works have modeled the correlation 
function for non-gaussian models using an extension of the 
MW formalism, and therefore their conclusions should be 
modified in the light of the results discussed here, which show
that MW does not accurately predict the halo correlation 
function even for Gaussian models. 

To summarize, cluster correlation length observations 
place strong cosmology independent constraints on the mat- 
ter power spectrum in the universe, constraints which future 
surveys such as SDSS will allow us to fully exploit. 